Abstract

This note is mainly to point out, if needed, that uncertainty about models
and their parameters has little to do with a ‘paradox’. The proposed ‘solution’
is to formulate practical questions instead of seeking refuge in abstract
principles. (And, in order to be concrete, some details on how to calculate the
probability density functions of the chord lengths are provided, together with
some comments on simulations and an appendix on the inferential aspects of
the problem.)


“A chord is drawn at random in a circle.
What is the probability that it is shorter
than the side of the inscribed equilateral triangle?

Among these three answers, which is the true one?
None of the three is false, none is exact,

the question is ill posed.”
(Joseph Bertrand)

“ Probability is either referred to real cases 

or it is nothing ” 
(Bruno de Finetti) 

“As far as the laws of mathematics refer to reality, 

they are not certain, 
and as far as they are certain, 
they do not refer to reality.” 
(Albert Einstein) 


*Note based on lectures to PhD students in Rome. The Android app mentioned in the text is
available at http://www.roma1.infn.it/~dagos/prob+stat.html#bpr.


1 Introduction 


The question asked by Joseph Bertrand in his 1889 book Calcul des probabilités [1] is
about the probability that a chord drawn ‘at random’ is smaller than the side of the
equilateral triangle inscribed in the same circle (see e.g. [2]). Obviously the question
can be restated asking about the probability that the chord will be larger or smaller
than the radius, or whatever segment you like, up to the diameter (for which the
solution is trivially 100%). The reason for the original choice of the side of such a
triangle is that the calculation is particularly easy, under the hypotheses Bertrand
originally considered, as we shall see.

The question can be restated in more general terms, i.e. that of finding the
probability distribution of the length l of a chord. Indeed, as is well known, our uncertainty
about the value a continuous variable can assume can be described by a probability
density function, hereafter ‘pdf’, $f(l)$. This should be written, more precisely, as
$f(l \mid I_s(t))$, where $I_s(t)$ is the information available to the subject s at the time t. In
fact,


“Since the knowledge may be different with different persons or with the 
same person at different times, they may anticipate the same event with 
more or less confidence, and thus different numerical probabilities may be 
attached to the same event.” [3] 

And, hence, probability is always conditional probability, as again well stated by
Schrödinger [3],

“Thus whenever we speak loosely of ‘the probability of an event,’ it is
always to be understood: probability with regard to a certain given state
of knowledge.”

These quotes, which are 100% in tune with common sense, definitely rule out the
use of the appellative of ‘paradox’ for the problem of the chords. In other words,
Bertrand’s ‘paradox’ belongs to a completely different class than e.g. Bertrand
Russell’s barber paradox. Absurd, instead, is the position of those who maintain that the
problem should have a unique solution once it is “well posed” [4], or that they have
found the “conclusive answer” [5].

From my point of view the question proposed by Bertrand can only be answered if
framed in a given context and ‘asked’ somehow, either to human beings, or - and
hence the quote marks - to Nature by performing suitable experiments (but making
a particular simulation, of the kind proposed in Ref. [5], is the same as
asking human beings - and even when making an experiment the result will depend on the
experimental setup!).

For example we can suddenly ask students, without any apparent reason (for
them), to draw a chord in a circle printed on a sheet of paper. And to give more
sense (and fun) to the ‘experiment’ we can make a bet among us on the resulting
length (the bet could be even more detailed, concerning for example the orientation
of the chord - for example it never happened to me that a student drew a vertical
one, but perhaps Japanese students might have a higher tendency to draw segments
top down!). Or we can ask students to write, with their preferred programming
language and plotting package, a ‘random chord generator’. In this case our bet will
be about the length that will result from a certain extraction, e.g. the first, or the
100th (if we were not informed about all or some of the previous 99 results, because
this information might change our odds about the outcome of the following ones, as
we shall see in Appendix C).

Indeed I have done experiments of this kind for several years, and I have formed
my opinion on how students react, depending on the class they attend and how
skilled they are in mathematics, and . . . even in games (yes! in the answer there is even
a flavor of game theory, because smart students unavoidably try to guess the reason
for the question and try to surprise you!). For example unsophisticated students draw
‘typical’ chords, of the kind you can get searching for the keywords “chord geometry”
in Google Images (see footnote 1). [Hence, for example, a possible real ‘well posed’ question could
be the following: “what is the length of the chord (in units of the radius) that will
appear in e.g. the 27-th (from left to right, top to bottom) image returned by the
search engine?”] When instead I propose the question to students of advanced years
I have quite some expectation that one or more of them will draw a diameter (just a
maximum chord) or even a tiny segment almost tangent to the circumference.

Essentially this is all that I have to say about this so called ‘paradox’. The
rest of the note has been written for didactic purposes in order to show how to
evaluate the probability distributions of interest and how to make the simulations.
The paper is supplemented by three appendices. Appendix A is a kind of extra
exercise on transformation of variables. Appendix B shows a simple way to write
an ‘infinite number’ of chord generators (indeed a generator characterized by two
hyperparameters). Appendix C is finally an inferential variation of Bertrand's problem,
in which we use ‘experimental data’ to guess the generator used and to predict the
length of other chords produced with the same generator.

2 Basic (‘classic’) geometric solutions 

Let us now start going through the ‘classical’ solutions of the problem, i.e. those
analyzed by Bertrand and which typically appear in the literature and on the web.
The adjective geometric is to distinguish them from a more ‘physical’ one, based on
a kind of realistic game, which we shall see in section 6.

2.1 Endpoints uniformly chosen on the circumference 

A way to draw a chord ‘at random’ is to choose two points on the circumference and
to join them with a segment. If we indicate the first point with A, corresponding to
a vertex of the equilateral triangle, as shown in Fig. 1, the chords smaller than the
side of the triangle are those with the other end either in the arc between A and B
or in that between A and C. The resulting probability is thus simply 2/3.

1 https://www.google.it/search?q=chord+geometry&source=lnms&tbm=isch

Figure 1: Left: circle with inscribed equilateral triangle and chords with one end in A and the
other distributed uniformly around the circumference. Right: geometric construction to show
how to evaluate the length l from $\theta_2$, with $0 \le \theta_2 \le \pi$ (see text for details).


More complete information about our beliefs that the length falls in any given
interval is provided by the pdf $f(l \mid \mathcal{M}_1)$, where $\mathcal{M}_1$ stands for ‘Model 1’. Since there
is a one to one correspondence between a point on the circle and the angle between
the radius to that point and the x-axis (according to the usual trigonometry convention),
we can turn our extraction method into two angles, $\theta_1$ and $\theta_2$, uniformly distributed
between 0 and $2\pi$. If we are only interested in the length of the chords and not
in their position inside the circle we can fix $\theta_1$ at $\pi$ and consider $\theta_2$ in the interval
between 0 and $\pi$. The corresponding chord will have a length of (see right plot in
Fig. 1)

$$ l = \sqrt{(R + R\cos\theta_2)^2 + R^2\sin^2\theta_2} $$
$$ \;\; = R\,\sqrt{(1 + \cos\theta_2)^2 + \sin^2\theta_2} $$
$$ \;\; = R\,\sqrt{2 + 2\cos\theta_2}\,, \tag{1} $$

or, more conveniently, the normalized length $\lambda$ will be

$$ \lambda = \frac{l}{R} = \sqrt{2 + 2\cos\theta_2}\,. \tag{2} $$

The problem is thus how to calculate the pdf (see footnote 2) $f(\lambda \mid \mathcal{M}_1)$ from

$$ f(\theta_2 \mid \mathcal{M}_1) = \frac{1}{\pi} \qquad (0 \le \theta_2 \le \pi)\,. \tag{3} $$

2 Obviously, the pdf’s $f(\theta_2 \mid \mathcal{M}_1)$ and $f(\lambda \mid \mathcal{M}_1)$ are usually expressed by different mathematical
functions, a point very clear among physicists. Mathematics oriented guys like to clarify it, thus
writing e.g. $f_\lambda(\lambda \mid \mathcal{M}_1)$, $f_{\theta_2}(\theta_2 \mid \mathcal{M}_1)$, and so on. I will add a proper subscript only if it is not clear
from the context what is what.
We shall use the general rule (see footnote 3)

$$ f(y) = \int_{-\infty}^{+\infty} \delta(y - g(x)) \cdot f(x)\,dx\,, \tag{4} $$

in which Y is related to X by $Y = g(X)$, with $g()$ a generic function and $\delta()$ the
Dirac delta. In our case we have then

$$ f(\lambda \mid \mathcal{M}_1) = \int_0^{\pi} \delta\!\left(\lambda - \sqrt{2 + 2\cos\theta_2}\right) \cdot f(\theta_2 \mid \mathcal{M}_1)\,d\theta_2\,. \tag{5} $$


Eq. (5) has the very simple interpretation of ‘summing up’ all infinitesimal
probabilities $f(\theta_2 \mid \mathcal{M}_1)\,d\theta_2$ that contribute to the ‘same value’ of $\lambda$ (the quote marks are
due to the fact that we are dealing with continuous quantities and hence we have
to use the rules of calculus). This interpretation is very useful to estimate $f(\lambda \mid \mathcal{M}_1)$ by
simulation: extract values of $\theta_2$ according to $f(\theta_2 \mid \mathcal{M}_1)$; for each value of $\theta_2$ calculate
the corresponding $\lambda$; summarize the result by suitable statistical indicators and
visualize it with a histogram.
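The simulation recipe just described can be sketched in a few lines of R. This is only a minimal illustration: the seed, the sample size and the variable names are arbitrary choices made here, and the histogram is computed without being drawn (call plot(h) to display it).

```r
# Monte Carlo sketch of Method 1, with theta_1 fixed at pi
set.seed(1)                              # assumption: fixed seed, for reproducibility only
n      <- 100000                         # assumption: arbitrary sample size
theta2 <- runif(n, 0, pi)                # theta_2 uniform in [0, pi], as in Eq. (3)
lambda <- sqrt(2 + 2 * cos(theta2))      # normalized chord length, as in Eq. (2)

frac_sqrt3 <- mean(lambda < sqrt(3))     # fraction of chords shorter than the triangle side
frac_one   <- mean(lambda < 1)           # fraction of chords shorter than the radius
h <- hist(lambda, breaks = 50, plot = FALSE)  # binned estimate of f(lambda | M_1)
```

The fraction frac_sqrt3 should fluctuate around the value 2/3 found above by the geometric argument.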

Making use of the properties of the Dirac delta and taking into account that $\lambda(\theta_2)$

3 An alternative way is to use the ‘text book’ transformation rule, valid for a monotonic function
$y = g(x)$ that relates the generic variable X to the variable Y:

$$ f_Y(y) = f_X(g^{-1}(y)) \cdot \left|\frac{d}{dy}\,g^{-1}(y)\right| , \tag{A} $$

which can be derived in the following way for the general variables X and Y [capital letters indicate
the variable, small letters the possible values - now it becomes important to make clear the different
pdf’s and we shall then use the notation $f_X()$ and $f_Y()$]:

$g'(x) > 0$: If $g()$ is non-decreasing in the range of X we have

$$ F_Y(y) = P(Y \le y) = P(g(X) \le y) = P(X \le g^{-1}(y)) = F_X(g^{-1}(y))\,. $$

Making use of the rules of calculus we have then

$$ f_Y(y) = \frac{d}{dy} F_Y(y) = \frac{d}{dy} F_X(g^{-1}(y)) = f_X(g^{-1}(y)) \cdot \frac{d}{dy}\,g^{-1}(y)\,. $$

$g'(x) < 0$: If, instead, $g()$ is non-increasing we have

$$ F_Y(y) = P(Y \le y) = P(g(X) \le y) = P(X \ge g^{-1}(y)) = 1 - F_X(g^{-1}(y))\,, $$

and then

$$ f_Y(y) = \frac{d}{dy} F_Y(y) = -\frac{d}{dy} F_X(g^{-1}(y)) = -f_X(g^{-1}(y)) \cdot \frac{d}{dy}\,g^{-1}(y) = f_X(g^{-1}(y)) \cdot \left|\frac{d}{dy}\,g^{-1}(y)\right| , $$

where the factorization in the last step is due to the fact that $f_X()$ cannot be negative, and
for this reason the absolute value in (A) is only on the second factor.

Equation (A) takes then into account the two possibilities and, let us stress once more, it is valid
for monotonic transformations. We shall use it to double check our results in footnotes 5, 6 and 7.
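decreases monotonically, we have (see footnote 4)

$$ f(\lambda \mid \mathcal{M}_1) = \int_0^{\pi} \delta\!\left(\lambda - \sqrt{2 + 2\cos\theta_2}\right) \frac{1}{\pi}\,d\theta_2 \tag{6} $$
$$ \qquad = \int_0^{\pi} \frac{\delta(\theta_2 - \theta_2^*)}{\left|\frac{d}{d\theta_2}\left(\lambda - \sqrt{2 + 2\cos\theta_2}\right)\right|_{\theta_2 = \theta_2^*}}\; \frac{1}{\pi}\,d\theta_2\,, \tag{7} $$

where $\theta_2^*$ is the solution of the equation

$$ \lambda - \sqrt{2 + 2\cos\theta_2} = 0\,, \tag{8} $$

that is

$$ \theta_2^* = \arccos\left(\frac{\lambda^2}{2} - 1\right) . \tag{9} $$

The derivative in Eq. (7) being

$$ \left.\frac{d}{d\theta_2}\left(\lambda - \sqrt{2 + 2\cos\theta_2}\right)\right|_{\theta_2 = \theta_2^*} = \frac{\sin\theta_2^*}{\sqrt{2 + 2\cos\theta_2^*}} \tag{10} $$
$$ \qquad = \sqrt{1 - \left(\frac{\lambda}{2}\right)^2}\,, \tag{11} $$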


4 In the step from Eq. (6) to Eq. (7) we are making use of the famous property (at least among
physicists) of the Dirac delta

$$ \delta(g(x)) = \sum_i \frac{\delta(x - x_i)}{|g'(x_i)|}\,, $$

where $x_i$ are the real roots of $g(x)$. In our case we have a single root, which we write as $x^*$, and
hence we get

$$ \delta(g(x)) = \frac{\delta(x - x^*)}{|g'(x^*)|}\,. $$

If we apply it to the general transformation rule (4) we obtain

$$ f_Y(y) = \int_{-\infty}^{+\infty} \delta(y - g(x)) \cdot f_X(x)\,dx = \int_{-\infty}^{+\infty} \frac{\delta(x - x^*)}{|g'(x^*)|}\, f_X(x)\,dx = \frac{f_X(x^*)}{|g'(x^*)|}\,, $$

in which we recognize Eq. (A) of footnote 3 if we note that $f_X(x^*)$ is $f_X(g^{-1}(y))$ and the derivative
of $g(x)$ w.r.t. x calculated in $x^*$ is the inverse of the derivative of the inverse function $g^{-1}$ (w.r.t.
y!) calculated in $y = g(x^*)$ [for the latter observation just think of the Leibniz notation $dy/dx =
1/(dx/dy)$].

Anyway, in order to avoid confusion the denominator in Eq. (7) has been written in the most
unambiguous way.
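the pdf of interest is (see footnote 5)

$$ f(\lambda \mid \mathcal{M}_1) = \frac{1}{\pi} \cdot \frac{1}{\sqrt{1 - (\lambda/2)^2}} \qquad (0 \le \lambda \le 2)\,, \tag{12} $$

from which we can also calculate the cumulative function

$$ F(\lambda \mid \mathcal{M}_1) = \int_0^{\lambda} f(\lambda' \mid \mathcal{M}_1)\,d\lambda' = \frac{2}{\pi} \cdot \arcsin\left(\frac{\lambda}{2}\right) , \tag{13} $$

both shown in Fig. 2.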

We can finally check the probability of interest, and also calculate the probability
of a chord to be smaller than the radius of the circle:

$$ P(\lambda \le \sqrt{3} \mid \mathcal{M}_1) = F(\sqrt{3} \mid \mathcal{M}_1) = \frac{2}{\pi} \cdot \arcsin\frac{\sqrt{3}}{2} = \frac{2}{\pi} \cdot \frac{\pi}{3} = \frac{2}{3} $$
$$ P(\lambda \le 1 \mid \mathcal{M}_1) = F(1 \mid \mathcal{M}_1) = \frac{2}{\pi} \cdot \arcsin\frac{1}{2} = \frac{2}{\pi} \cdot \frac{\pi}{6} = \frac{1}{3}\,. $$

Simulations of chords with this method are reported in the top left plot of Figs. 19-23
at the end of the paper: Fig. 19 shows a sample of random chords; Fig. 21
the position of their centers. The distribution of the lengths in units of R is shown
in the histograms of Fig. 20, while those of Fig. 22 show the distribution of the
distance of the chords from the center of the circle, about which we shall say more in
Appendix A. Finally, Fig. 23 is like Fig. 19 but ‘without rotation’, the meaning of
this expression becoming clearer as we go through the various methods. In the case
of the first method this corresponds to the way we have followed to derive $f(\lambda \mid \mathcal{M}_1)$
in this subsection, that is fixing $\theta_1$ at $\pi$ and extracting $\theta_2$ between 0 and $\pi$.

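5 As an exercise, we can check the result with that obtainable using the ‘text book’ transformation
rule described in footnote 3. In our case, using the general symbol $g()$ introduced there, we have
$\lambda = g(\theta_2) = \sqrt{2 + 2\cos\theta_2}$ and $\theta_2 = g^{-1}(\lambda) = \arccos(\lambda^2/2 - 1)$. Applying Eq. (A) of footnote 3 we
have

$$ f_\lambda(\lambda) = f_{\theta_2}(g^{-1}(\lambda)) \cdot \left|\frac{d}{d\lambda}\,g^{-1}(\lambda)\right| = \frac{1}{\pi} \cdot \left|\frac{d}{d\lambda}\arccos\left(\lambda^2/2 - 1\right)\right| = \frac{1}{\pi} \cdot \left|\frac{-\lambda}{\sqrt{1 - (\lambda^2/2 - 1)^2}}\right| , $$

also yielding Eq. (12). (The minus sign resulting from the derivation is because $\lambda$ decreases as $\theta_2$
increases, as is clear from Fig. 1.)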
Figure 2: Probability distribution of $\lambda = l/R$ of the chords generated with Method 1. The
dashed vertical line indicates $\lambda = \sqrt{3}$.

Figure 3: Construction of a chord orthogonal to a radius and distant r from the center of the
circle.

2.2 Chords orthogonal to a radius, with center uniformly 
distributed along it 

The second ‘classical’ algorithm consists in choosing chords orthogonal to a radius,
with their centers uniformly distributed along it. As we easily see from Fig. 3, the
condition for a chord to be smaller than the side of the triangle ($l_T$) is the same as
requiring that its distance from the center of the circle, indicated by r in the figure,
is above R/2. That is

$$ P(l < l_T \mid \mathcal{M}_2) = P(r > R/2 \mid \mathcal{M}_2) = \frac{1}{2}\,. \tag{14} $$

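Let us repeat the exercise of evaluating the probability distribution of the lengths
of the chords obtained with this method. Using $\rho$ to indicate $r/R$, in analogy to
$\lambda = l/R$, we have (see figure)

$$ \lambda = 2\sqrt{1 - \rho^2}\,, \tag{15} $$

with

$$ f(\rho \mid \mathcal{M}_2) = 1 \qquad (0 \le \rho \le 1)\,. \tag{16} $$

Then, the pdf of interest will be given by

$$ f(\lambda \mid \mathcal{M}_2) = \int_0^1 \delta\!\left(\lambda - 2\sqrt{1 - \rho^2}\right) \cdot 1\,d\rho \tag{17} $$
$$ \qquad = \int_0^1 \frac{\delta(\rho - \rho^*)}{\left|\frac{d}{d\rho}\left(\lambda - 2\sqrt{1 - \rho^2}\right)\right|_{\rho = \rho^*}}\,d\rho \tag{18} $$
$$ \qquad = \frac{\sqrt{1 - \rho^{*2}}}{2\rho^*}\,, \tag{19} $$

with

$$ \rho^* = \sqrt{1 - (\lambda/2)^2}\,. \tag{20} $$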

The pdf and the cumulative distribution of interest are then (see footnote 6)

$$ f(\lambda \mid \mathcal{M}_2) = \frac{\lambda}{4\sqrt{1 - (\lambda/2)^2}} \qquad (0 \le \lambda \le 2) \tag{21} $$
$$ F(\lambda \mid \mathcal{M}_2) = 1 - \sqrt{1 - (\lambda/2)^2}\,, \tag{22} $$

plotted in Fig. 4, and from which we can calculate the probabilities of interest:

$$ F(\sqrt{3} \mid \mathcal{M}_2) = \frac{1}{2} \tag{23} $$
$$ F(1 \mid \mathcal{M}_2) = 1 - \frac{\sqrt{3}}{2} \approx 13.4\%\,. \tag{24} $$

Simulations of chords with this method are reported in the plots of Figs. 19-23.

2.3 Center of chords uniformly chosen inside the circle, with
chords orthogonal to radius

The third method is a variant of the second, in which the centers of the chords, instead
of being generated uniformly along a radius, are generated uniformly inside the circle.
The centers of the chords fall inside the circle of radius R/2 with probability 1/4,
and then (see again Fig. 3)

$$ P(l < l_T \mid \mathcal{M}_3) = P(r > R/2 \mid \mathcal{M}_3) = \frac{3}{4}\,. \tag{25} $$

Let us repeat once more the exercise of calculating the probability distribution of $\lambda$.
The difference with respect to the previous case is that now the pdf of the center of
the chords is proportional to r, since $dP \propto (2\pi r)\,dr$ (see Fig. 5). The pdf of $\rho$ is then,
after normalization,

$$ f(\rho \mid \mathcal{M}_3) = 2\rho \qquad (0 \le \rho \le 1)\,, \tag{26} $$

and the equivalent of Eqs. (17)-(19) and sequel are now

$$ f(\lambda \mid \mathcal{M}_3) = \int_0^1 \delta\!\left(\lambda - 2\sqrt{1 - \rho^2}\right) \cdot 2\rho\,d\rho \tag{27} $$

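6 Let us repeat the exercise of using Eq. (A) of footnote 3 also in this case, starting now from
$\lambda = 2\sqrt{1 - \rho^2} = g(\rho)$, $\rho = \sqrt{1 - (\lambda/2)^2} = g^{-1}(\lambda)$ and $f_\rho(\rho) = 1$:

$$ f_\lambda(\lambda) = f_\rho(g^{-1}(\lambda)) \cdot \left|\frac{d}{d\lambda}\,g^{-1}(\lambda)\right| = \frac{\lambda}{4\sqrt{1 - (\lambda/2)^2}}\,, $$

that is precisely Eq. (21).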
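Method 2 can be simulated along the same lines as Method 1. The following R sketch (seed, sample size and names such as F2 are choices made here for illustration) extracts the normalized distance of the chord from the center uniformly between 0 and 1 and compares the observed fractions with the cumulative function of Eq. (22).

```r
# Monte Carlo sketch of Method 2: chord center uniform along a radius
set.seed(2)                          # assumption: fixed seed, for reproducibility only
n      <- 100000                     # assumption: arbitrary sample size
rho    <- runif(n)                   # rho = r/R uniform in [0, 1], as in Eq. (16)
lambda <- 2 * sqrt(1 - rho^2)        # normalized chord length, as in Eq. (15)

F2 <- function(x) 1 - sqrt(1 - (x / 2)^2)   # cumulative function, Eq. (22)

frac_sqrt3 <- mean(lambda < sqrt(3)) # to be compared with F2(sqrt(3))
frac_one   <- mean(lambda < 1)       # to be compared with F2(1)
```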
Figure 4: Probability distribution of $\lambda = l/R$ of the chords generated with Method 2. The
dashed vertical line indicates $\lambda = \sqrt{3}$.

Figure 5: As Fig. 3 but with the annulus of infinitesimal width dr drawn at r to show that in
Model 3 the infinitesimal probability dP that the distance of a chord from the center is between
r and r + dr is given by $dP = dF(r) = 2\pi r\,dr/(\pi R^2) = (2r/R^2)\,dr$, and hence $dF(\rho) = 2\rho\,d\rho$,
or, in our notation, $f(\rho \mid \mathcal{M}_3) = 2\rho$, with $\rho = r/R$.


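$$ \qquad = \int_0^1 \frac{\delta(\rho - \rho^*) \cdot 2\rho}{\left|\frac{d}{d\rho}\left(\lambda - 2\sqrt{1 - \rho^2}\right)\right|_{\rho = \rho^*}}\,d\rho \tag{28} $$
$$ \qquad = \frac{2\rho^*}{2\rho^*/\sqrt{1 - \rho^{*2}}} = \sqrt{1 - \rho^{*2}}\,, \tag{29} $$

with the same $\rho^*$ of Eq. (20), thus leading to (see footnote 7)

$$ f(\lambda \mid \mathcal{M}_3) = \sqrt{1 - \left(1 - (\lambda/2)^2\right)} = \frac{\lambda}{2}\,. \tag{30} $$

The result can then be summarized as

$$ f(\lambda \mid \mathcal{M}_3) = \frac{\lambda}{2} \qquad (0 \le \lambda \le 2) \tag{31} $$
$$ F(\lambda \mid \mathcal{M}_3) = \frac{\lambda^2}{4}\,, \tag{32} $$

from which we obtain

$$ F(\sqrt{3} \mid \mathcal{M}_3) = \frac{3}{4} \tag{33} $$
$$ F(1 \mid \mathcal{M}_3) = \frac{1}{4}\,. \tag{34} $$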
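7 Let us repeat once more the exercise done in footnotes 5 and 6, since in this case the starting
pdf is not a constant, being $f_\rho(\rho) = 2\rho$. All the rest is like in footnote 6. Here it is:

$$ f_\lambda(\lambda) = f_\rho(g^{-1}(\lambda)) \cdot \left|\frac{d}{d\lambda}\,g^{-1}(\lambda)\right| = 2\sqrt{1 - (\lambda/2)^2} \cdot \frac{\lambda}{4\sqrt{1 - (\lambda/2)^2}} = \frac{\lambda}{2}\,. $$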
Figure 6: Probability distribution of $\lambda = l/R$ of the chords generated with Method 3. The
dashed vertical line indicates $\lambda = \sqrt{3}$.

Figure 7: A way to draw chords with lengths uniformly distributed between 0 and 2R.


Simulations of chords with this method are reported in the plots of Figs. 19-23.


3 Some “ruler and compass” methods 

Let us now see two more ‘geometric’ methods in which chords are drawn by opening
at random an ideal compass (‘ideal’ because the minimum distance between the
end points is taken to be zero) from a reference point along the circumference. (By
‘opening at random’ we mean that the endpoints of the compass will define, uniformly,
segments up to the diameter. Changing the opening method we can then define two
classes of chord generators.)

3.1 Chords with length uniformly distributed between 0 and 2R

It is curious that the method of simply taking chords with length uniformly
distributed up to the length of the diameter is usually not taken into account, although
the idea is not bizarre at all. Indeed this could even be the natural procedure for
someone used to operating in classical geometry with ruler and compass: place the
needle of the compass in a point of the circumference (e.g. A in Fig. 7) to define one
endpoint of the chord; then place the pencil lead along the diameter that meets the
circumference in A; finally rotate the compass in either direction (anticlockwise in
the figure) and find the second endpoint of the chord.

The probability distribution of $\lambda$ (see Fig. 8) as well as the probabilities of interest
are in this case really trivial:

$$ f(\lambda \mid \mathcal{M}_4) = \frac{1}{2} \qquad (0 \le \lambda \le 2) \tag{35} $$
$$ F(\lambda \mid \mathcal{M}_4) = \frac{\lambda}{2} \tag{36} $$
$$ F(\sqrt{3} \mid \mathcal{M}_4) = \frac{\sqrt{3}}{2} \tag{37} $$
$$ F(1 \mid \mathcal{M}_4) = \frac{1}{2}\,. \tag{38} $$

Simulations of chords with this method are reported in the plots of Figs. 19-23.
Nevertheless, although the results from this extraction model are very easy, if we
ask someone to make a computer program to really draw chords ‘at random’, I would
not bet 8.7 to 1.3 (that is $\approx \sqrt{3}/2$ to $1 - \sqrt{3}/2$) that a chord will be smaller than the
side of the triangle! This point will become clear in section 5.

Figure 8: Probability distribution of $\lambda = l/R$ of the chords generated with Method 4. The
dashed vertical line indicates $\lambda = \sqrt{3}$.

Figure 9: An alternative way to draw chords with ruler and compass. Left: construction of
the chord. Right: relation between l and l' (both in units of the radius).
3.2 A variant of Method 4 

When using ruler and compass it is almost automatic that, after having drawn an
arc in one direction to intercept the circumference, one rotates the tool in the other
direction, thus identifying the two points E and D of Fig. 9, which then become the
natural endpoints of a chord (the reader understands that at this point the use of the
adjective ‘natural’ is at the limit of being sarcastic). The length l' of this chord is related
to the segment l defined (see footnote 8) by the opening of the compass by $l' = 2\,l\,\sqrt{1 - (l/2R)^2}$, or,
in units of the radius,

$$ \lambda' = 2\lambda\sqrt{1 - (\lambda/2)^2}\,. \tag{39} $$

This relation is shown in the right plot of Fig. 9, from which we can calculate the
usual probability that $\lambda'$ is smaller than $\sqrt{3}$. This is equal to the probability that $\lambda$
is smaller than 1, that is 1/2, plus the probability that it is larger than $\sqrt{3}$, that is
$(2 - \sqrt{3})/2$. The result is then $(3 - \sqrt{3})/2 \approx 63.4\%$, which does not correspond to any
of the previous methods. Similarly, the probability that the chord is smaller than the
radius, or that $\lambda' < 1$, is equal to the probability that $\lambda$ is smaller than $\sqrt{2 - \sqrt{3}}$
plus the probability that it is larger than $\sqrt{2 + \sqrt{3}}$, that is (see footnote 9)

$$ \frac{\sqrt{2 - \sqrt{3}}}{2} + \frac{2 - \sqrt{2 + \sqrt{3}}}{2} = 1 - \frac{1}{\sqrt{2}} \approx 29.3\%\,. \tag{40} $$

(This second calculation is important in order to double check the result that we shall
get later, the resulting formulae being not very ‘pretty’.)

Also in this case it is possible to arrive at a closed expression of the pdf. We
do here the exercise mainly to show a case in which the property of the Dirac delta
needed to make the transformation involves more than one root, as evident from the
non-monotonic relation shown in Fig. 9. Here are the details:

$$ f(\lambda' \mid \mathcal{M}_5) = \int_0^2 \delta\!\left(\lambda' - 2\lambda\sqrt{1 - (\lambda/2)^2}\right) \cdot \frac{1}{2}\,d\lambda $$
$$ \qquad = \int_0^2 \left( \frac{\delta(\lambda - \lambda_1^*)}{\left|\frac{d}{d\lambda}\left(\lambda' - 2\lambda\sqrt{1 - (\lambda/2)^2}\right)\right|_{\lambda = \lambda_1^*}} + \frac{\delta(\lambda - \lambda_2^*)}{\left|\frac{d}{d\lambda}\left(\lambda' - 2\lambda\sqrt{1 - (\lambda/2)^2}\right)\right|_{\lambda = \lambda_2^*}} \right) \frac{1}{2}\,d\lambda $$

with

$$ \lambda_1^* = \sqrt{2 - \sqrt{4 - \lambda'^2}} \tag{41} $$
$$ \lambda_2^* = \sqrt{2 + \sqrt{4 - \lambda'^2}}\,. \tag{42} $$

The result is, continuing to indicate in this subsection the chord of interest with $\lambda'$,

$$ f(\lambda' \mid \mathcal{M}_5) = \frac{\sqrt{2 + \sqrt{4 - \lambda'^2}} + \sqrt{2 - \sqrt{4 - \lambda'^2}}}{4\sqrt{4 - \lambda'^2}}\,. \tag{43} $$

8 The points ADF define a right triangle. The length of the segment AG is then equal to
$l^2/2R$, while the square of the length of DG is equal to the product of the lengths of AG and of
GF. It follows then

$$ l' = 2 \times \sqrt{\frac{l^2}{2R} \cdot \left(2R - \frac{l^2}{2R}\right)}\,. $$

9 It is remarkable that $\sqrt{2 + \sqrt{3}} - \sqrt{2 - \sqrt{3}} = \sqrt{2}$, an identity used to rewrite Eq. (40) in a more
compact form.


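The cumulative distribution can be obtained making the usual integral, although
the integrand is in this case particularly nasty (see footnote 10). In reality the cumulative
distribution can be calculated extending the reasoning followed above to calculate the
probability that the chord is smaller than the side of the triangle and than the
radius. In fact, remembering the transformation from $\lambda$ to $\lambda'$ plotted in Fig. 9 and
using capital letters to distinguish the variables from their values, we have

$$ P(\Lambda' \le \lambda') = P\!\left(\Lambda \le \sqrt{2 - \sqrt{4 - \lambda'^2}}\right) + P\!\left(\sqrt{2 + \sqrt{4 - \lambda'^2}} \le \Lambda \le 2\right) \tag{44} $$
$$ \qquad = \frac{\sqrt{2 - \sqrt{4 - \lambda'^2}}}{2} + \frac{2 - \sqrt{2 + \sqrt{4 - \lambda'^2}}}{2}\,, \tag{45} $$

from which it follows

$$ F(\lambda' \mid \mathcal{M}_5) = \frac{1}{2}\left(2 + \sqrt{2 - \sqrt{4 - \lambda'^2}} - \sqrt{2 + \sqrt{4 - \lambda'^2}}\right) , \tag{46} $$

which, differentiated, reproduces Eq. (43). Pdf and cumulative functions are shown in
Fig. 10. The two probabilities of interest are then

$$ P(\lambda' \le \sqrt{3} \mid \mathcal{M}_5) = \frac{3 - \sqrt{3}}{2} \approx 0.634 \tag{47} $$
$$ P(\lambda' \le 1 \mid \mathcal{M}_5) = 1 - \frac{1}{\sqrt{2}} \approx 0.293\,. \tag{48} $$

Results based on simulations are shown in Figs. 19-23.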
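Since the formulae of Method 5 are not very ‘pretty’, a simulation is a useful independent check. The R sketch below (seed, sample size, grid of test values and the name F5 are choices made here) extracts the opening of the compass uniformly between 0 and 2, applies the transformation of Eq. (39) and compares the empirical cumulative frequencies with Eq. (46).

```r
# Monte Carlo sketch of Method 5 and comparison with the closed-form cumulative function
set.seed(5)                                  # assumption: fixed seed, for reproducibility only
n        <- 100000                           # assumption: arbitrary sample size
lambda   <- runif(n, 0, 2)                   # compass opening, uniform as in Method 4
lambda_p <- 2 * lambda * sqrt(1 - (lambda / 2)^2)   # resulting chord, Eq. (39)

F5 <- function(x) {                          # cumulative function, Eq. (46)
  s <- sqrt(4 - x^2)
  0.5 * (2 + sqrt(2 - s) - sqrt(2 + s))
}

x_grid  <- c(0.5, 1, sqrt(3), 1.9)           # assumption: arbitrary test values of lambda'
emp     <- sapply(x_grid, function(x) mean(lambda_p <= x))  # empirical frequencies
max_dev <- max(abs(emp - F5(x_grid)))        # largest deviation from Eq. (46)
```

If the derivation is right, max_dev should be of the order of the statistical fluctuations, i.e. a few per mille with this sample size.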

3.3 Summary of the results obtained by the various methods 

The results obtained by the ‘geometric’ methods $\mathcal{M}_1$-$\mathcal{M}_5$ are summarized in Tab. 1
and in Fig. 11, together with the more physical one $\mathcal{M}_6$, that will be discussed in
section 6. In the table we have also added, for completeness, the expected values
and the standard deviations of the probability distributions. More details, obtained
from simulations, are shown in Figs. 19-23. The dashed lines of Fig. 11 represent
instead the arithmetic averages of the other functions, whose motivation will be clear
in Appendix C.


10 Wolfram Mathematica provides the following result:

F(X'\M 5 ) = i ^2A' - 2 ^2 - ^4 - A' 2 - ^/(4 - A' 2 ) • (2 - \/4 - A' 2 ) 

+2^2+ ^/(4 — A' 2 ) • (2 + \/4 — A' 2 )^ , 

that, although apparently quite different from Eq. (46), can be checked to be numerically equivalent
to it.
Figure 10: Probability distribution of $\lambda = l/R$ of the chords generated with Method 5. The
dashed vertical line indicates $\lambda = \sqrt{3}$.
Table 1: Summary of the results obtained by the different ‘random’ drawing models of the
chords

Model                 $f(\lambda)$                          $F(\lambda)$                           $E(\lambda)$               $\sigma(\lambda)$                        $P(\lambda \le \sqrt{3})$   $P(\lambda \le 1)$

$\mathcal{M}_1$       $\frac{1}{\pi\sqrt{1-(\lambda/2)^2}}$   $\frac{2}{\pi}\arcsin\frac{\lambda}{2}$   $4/\pi$ ($\approx 1.27$)   $\sqrt{2 - 16/\pi^2}$ ($\approx 0.62$)   2/3                         1/3
Figure 11: Probability density functions and cumulative functions of the lengths of the chords
in units of the radius of the circle for the extraction models considered in the text (the order
of the models in the legend corresponds to the decreasing order of $F(1 \mid \mathcal{M}_i)$ and $f(0.1 \mid \mathcal{M}_i)$).
The dashed lines correspond to the ‘averages’, to which we shall come back in Appendix C.
4 A chord length generator (and its implementation
in R)

In reality, if we are just interested in writing a computer program which generates the
length of the chords ‘at random’, using one of the five methods we have seen, there is
no need to go through all the steps of the operational descriptions of the algorithms.
We can just use the probability distributions, summarized in Table 1. In order to do
that we need a premise, highlighted in the following subsection.


4.1 A curious transformation and its practical importance 


Imagine we have a generic continuous variable X whose uncertainty is described by
the pdf $f_X(x)$ and the cumulative distribution $F_X(x)$ (in this subsection we shall
use the more precise notation introduced in footnote 3). We might think of a new
variable Y, related to X by $Y = F_X(X)$, that is

$$ y = g(x) = F_X(x)\,, \tag{49} $$


written in a way to stress that in this case $F_X()$ plays now the role of the generic
mathematical function $g()$, independently of its probabilistic meaning.

Making use of Eq. (4) the pdf of Y can be evaluated as

$$ f_Y(y) = \int_{-\infty}^{+\infty} \delta(y - F_X(x)) \cdot f_X(x)\,dx\,. \tag{50} $$

$F_X()$ being monotonic and non-decreasing, with derivative equal to $f_X()$, we have then

$$ f_Y(y) = \int_{-\infty}^{+\infty} \frac{\delta(x - x^*)}{f_X(x^*)} \cdot f_X(x)\,dx = \frac{f_X(x^*)}{f_X(x^*)} = 1\,. \tag{51} $$

This is a great result, that we double check following the same reasoning used in
footnote 3:

$$ F_Y(y) = P(Y \le y) = P(F_X(X) \le y) = P(X \le F_X^{-1}(y)) = F_X(F_X^{-1}(y)) = y $$
$$ f_Y(y) = \frac{d}{dy} F_Y(y) = 1\,. $$

Independently of the pdf $f_X()$, the variable Y defined in this way is uniformly
distributed between 0 and 1. It follows that the pdf of a variable defined as $X = F_X^{-1}(Y)$
will be $f_X(x)$.

This observation suggests a simple algorithm, very useful in those cases in which
the cumulative function is easy to invert, to make a (pseudo) random number
generator to produce numbers such that our confidence in the occurrence of X = x
is proportional to $f(x)$:

$$ x = F_X^{-1}(u)\,, \tag{52} $$

where u stands for the occurrence of a uniform (pseudo) random generator that gives
numbers (apparently) ‘at random’ between 0 and 1 (random number generators of
this kind are available in all computational environments).
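The two statements of this subsection can be illustrated numerically. In the R sketch below the exponential distribution is used purely as an example of $f_X$ (an assumption made here, not a choice of the text), since R provides its cumulative function pexp() and its inverse qexp().

```r
# Sketch of the 'curious transformation' and of the resulting generator
set.seed(4)                      # assumption: fixed seed, for reproducibility only
n <- 100000                      # assumption: arbitrary sample size

x <- rexp(n, rate = 1)           # X distributed according to an example f_X (exponential)
y <- pexp(x, rate = 1)           # Y = F_X(X): expected to be uniform between 0 and 1
mean_y <- mean(y)                # a uniform variable in [0, 1] has mean 1/2 ...
var_y  <- var(y)                 # ... and variance 1/12

u     <- runif(n)                # uniform (pseudo) random numbers
x_new <- qexp(u, rate = 1)       # X = F_X^{-1}(u), as in Eq. (52)
mean_x_new <- mean(x_new)        # expected value of this exponential is 1
```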
4.2 Application to the chord problem

Fortunately we can apply this trick to all probability distributions of the chords found
in the previous sections. The generic rule (52) gets then implemented as follows.

$\mathcal{M}_1$:  $\lambda = 2\sin(\pi u/2)$.

$\mathcal{M}_2$:  $\lambda = 2\sqrt{2u - u^2}$.

$\mathcal{M}_3$:  $\lambda = 2\sqrt{u}$.

$\mathcal{M}_4$:  $\lambda = 2u$.

$\mathcal{M}_5$:  $\lambda = 2\sqrt{1 - 2u^2 + u^4}$.

And this is then, for example, the resulting function written in the R language [6]:

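rlchords <- function(n, meth) {
  u <- runif(n)                       # n uniform numbers between 0 and 1
  switch(meth,                        # meth = 1, ..., 5 selects the method
         2*sin(pi*u/2),               # Method 1
         2*sqrt(2*u - u^2),           # Method 2
         2*sqrt(u),                   # Method 3
         2*u,                         # Method 4
         2*sqrt(1 - 2*u^2 + u^4))     # Method 5
}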

Issuing then the following instruction from an R console (‘>’ stands for the prompt)
you can for example get a histogram similar to the top left one of Fig. 20.

> lambda=rlchords(100000, 1); hist(lambda, nc=200, col='red')

Or you can calculate sample mean and standard deviation, and the fraction of
occurrences with $\lambda$ smaller than $\sqrt{3}$, with these very simple commands

> mean(lambda)

> sd(lambda)

> length(lambda[lambda<sqrt(3)]) / length(lambda)

(You will get sample mean and standard deviation very close to the mean and
standard deviation of the distribution because it is very unlikely to have something very
different, as a famous theorem of probability theory makes us feel confident.)
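The same check can be repeated for all five methods at once. The sketch below redefines the generator so as to be self-contained, and compares the simulated fractions of chords shorter than the side of the triangle with the probabilities derived in the previous sections (the seed, the sample size and the names theory and simulated are choices made here).

```r
# Chord length generator for the five geometric methods (restated to be self-contained)
rlchords <- function(n, meth) {
  u <- runif(n)
  switch(meth,
         2*sin(pi*u/2),               # Method 1
         2*sqrt(2*u - u^2),           # Method 2
         2*sqrt(u),                   # Method 3
         2*u,                         # Method 4
         2*sqrt(1 - 2*u^2 + u^4))     # Method 5
}

set.seed(7)                           # assumption: fixed seed, for reproducibility only
# P(lambda < sqrt(3) | M_i) for i = 1, ..., 5, as derived in the text
theory    <- c(2/3, 1/2, 3/4, sqrt(3)/2, (3 - sqrt(3))/2)
simulated <- sapply(1:5, function(m) mean(rlchords(100000, m) < sqrt(3)))
comparison <- round(cbind(theory, simulated), 3)   # one row per method
```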
5 What should one expect from a computer drawing
program?


“Mater artium necessitas” (“Necessity is the mother of the arts”)

The original Bertrand problem is about drawing chords and not just telling numbers
between 0 and 2 (taking a unitary radius). Therefore the question we have to ask is
really to draw the chord, by hand or by a computer program (see footnote 11). In the introduction
I have told what I more or less expect when I ask the practical question, providing a
sheet of paper with a pre-designed circle, and I must say that in the last years I had
no surprises that induced me to change the model I formed in my mind. You may
form yours with practice.

More recently I have also asked PhD students to write “chord generators” with
their preferred computer language. As is easy to guess, the choice goes to the
algorithm easiest to implement, which for physics students is Method 1, since they
are familiar with circular motion and with transformations from polar to Cartesian
coordinates.

The other methods are somehow tedious because, if taken literally, they require
several steps with formulae not used every day, that one needs to derive. For example
Method 2 literally requires: 1) to choose a radius at random; 2) to choose a point on
the radius; 3) to find the equation of the line orthogonal to the radius in that point; 4) to
find the intersections of the line with the circle. Indeed - I have to confess with some
shame - the first time I was playing with the Bertrand problem, I was implementing in
R these detailed procedures. When during this year's course I tried to make an Android
app to draw ‘random’ chords on a circle, using App Inventor [7], I was horrified by
the formulae I had to ‘write’ with that tool. So I initially implemented only Method
1. After a while I also implemented Method 2, but not following the procedures
described above. The trick was to extract a point along the horizontal diameter, thus
coinciding with the abscissa, and then operate a random rotation. In that way it was
possible to reuse somehow the ‘blocks’ (the graphical programming elements shown
e.g. in Fig. 12) developed for Method 1.

The rest of this section is devoted to simulation issues, showing how to avoid pedantic procedures, without pretending that the suggested algorithms are the ‘best’ in some sense that would need to be better defined. (My suggestion to students is that for everyday use the ‘fastest’ algorithm is the one that they write most rapidly and understand best: unless you need it for special purposes, it is a waste of time to spend several hours of your life writing a piece of program that provides the result in microseconds instead of tens, hundreds or even thousands of milliseconds.)

11 Nevertheless, the lengths provided by the ‘chord generator’ presented in the previous section do provide valid answers, since the pdf’s have been derived using some geometric rules to produce chords and not with an abstract algorithm to produce numbers between 0 and 2, like those you would get e.g. with the following R command
> n=10; lambda = 2*sin(runif(n, 0, pi))^2
5.1 Model 1 ($\mathcal{M}_1$)

As stated above, this is the one that appears the simplest (to implement in a program) to physics students, to most colleagues and to myself. Here is how it appears in R (n is the number of chords, which should be set in a previous command).

> ph1 <- runif(n, 0, 2*pi)
> ph2 <- runif(n, 0, 2*pi)
> p1 <- cbind(cos(ph1), sin(ph1))   # [1]
> p2 <- cbind(cos(ph2), sin(ph2))
> l <- sqrt( (p2[,1]-p1[,1])^2 + (p2[,2]-p1[,2])^2 )

The result is a ‘vector’ l of n chord lengths (in units of the radius). In addition we have the matrices of the intersection points (each row is a point).
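As a quick sanity check of the lines above, here is a minimal sketch that wraps them in a helper and compares two summaries of the simulated lengths with the values expected for Method 1, namely a fraction 2/3 of chords shorter than $\sqrt{3}$ and a mean length of $4/\pi$. The helper name `draw_chords_m1`, the seed and the sample size are illustrative assumptions, not part of the original code.

```r
# Sketch: Method 1 as a reusable function (the helper name is hypothetical).
draw_chords_m1 <- function(n) {
  ph1 <- runif(n, 0, 2*pi)            # polar angle of the first endpoint
  ph2 <- runif(n, 0, 2*pi)            # polar angle of the second endpoint
  p1 <- cbind(cos(ph1), sin(ph1))     # endpoints on the unit circle
  p2 <- cbind(cos(ph2), sin(ph2))
  l  <- sqrt((p2[,1]-p1[,1])^2 + (p2[,2]-p1[,2])^2)
  list(p1 = p1, p2 = p2, l = l)
}

set.seed(1)                           # arbitrary seed, only for reproducibility
ch <- draw_chords_m1(100000)
frac_m1 <- mean(ch$l < sqrt(3))       # should be close to 2/3
mean_m1 <- mean(ch$l)                 # should be close to 4/pi
c(frac_m1, mean_m1)
```

Returning the two endpoint matrices together with the lengths keeps the same information as the original lines, so the output can be passed directly to a plotting routine.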

5.2 Model 2 ($\mathcal{M}_2$)

In this case we start by extracting $x_1 = x_2$ between $-1$ and 1 and evaluate the corresponding ordinates on the circumference, i.e.

> p1 <- p2 <- runif(n, -1, 1)
> p1 <- cbind(p1, sqrt(1-p1^2))    # [2]
> p2 <- cbind(p2, -sqrt(1-p2^2))

Then we define a random rotation angle and add it to the polar angles calculated from the points:

> phr <- runif(n, 0, pi)
> ph1 <- phr + atan2(p1[,2], p1[,1])
> ph2 <- phr + atan2(p2[,2], p2[,1])

At this point we can reuse exactly the last three lines of code of the previous method, i.e. starting from the line tagged by “# [1]”. The resulting App Inventor blocks are shown in Fig. [12].
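The whole chain for Method 2 can be put together in a few lines. The following sketch (variable names, seed and sample size are illustrative assumptions) also checks two things that the construction implies: the random rotation must leave each chord length equal to $2\sqrt{1-x^2}$, and about half of the chords should be shorter than $\sqrt{3}$.

```r
# Sketch: Method 2 via "point on the horizontal diameter + random rotation".
set.seed(2)                                  # arbitrary seed
n   <- 100000
x   <- runif(n, -1, 1)                       # abscissa of the point on the diameter
p1  <- cbind(x,  sqrt(1 - x^2))              # upper intersection with the circle
p2  <- cbind(x, -sqrt(1 - x^2))              # lower intersection with the circle
phr <- runif(n, 0, pi)                       # random rotation angle
ph1 <- phr + atan2(p1[,2], p1[,1])           # rotated polar angles
ph2 <- phr + atan2(p2[,2], p2[,1])
q1  <- cbind(cos(ph1), sin(ph1))             # rotated endpoints
q2  <- cbind(cos(ph2), sin(ph2))
l   <- sqrt((q2[,1]-q1[,1])^2 + (q2[,2]-q1[,2])^2)
max_diff <- max(abs(l - 2*sqrt(1 - x^2)))    # rotation must not change the length
frac_m2  <- mean(l < sqrt(3))                # expected to be close to 1/2
c(max_diff, frac_m2)
```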

5.3 Model 3 ($\mathcal{M}_3$)

To choose a point uniformly inside the circle we could extract $x$ and $y$ uniformly between $-1$ and 1 and discard the points which fall outside the circle of radius 1. But we can reuse the previous code (or App Inventor blocks) if we extract a point along the radius with pdf $f(\rho) = 2\rho$, as we have learned in subsection 2.3, making use of the technique learned in subsection 4.1:

> rho <- sqrt(runif(n))

But, in order to reuse the previous code, we have to invert at random (with probability 1/2) the sign of these numbers. Technically this can be done in R by creating a vector of random $-1$’s and 1’s obtained from a binomial generator and multiplying it, element by element, with the vector rho. Thus our starting abscissas will be

> p1 <- p2 <- rho * ( rbinom(n, 1, 0.5)*2 - 1 )

After this, we continue exactly as in the second line of R code of the previous method, tagged by “# [2]”. The implementation of this variation in App Inventor is shown in Fig. [13].
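The two ways of getting the distance $\rho$ mentioned above (discarding points outside the circle, or sampling directly from $f(\rho) = 2\rho$) can be compared numerically. This is only a sketch: the variable names, the seed and the sample size are assumptions made for illustration. Both samples should have a mean close to 2/3, and the accept/reject version should keep a fraction of about $\pi/4$ of the points.

```r
# Sketch: distance rho of a point uniform in the unit disc, obtained in two ways.
set.seed(3)                                  # arbitrary seed
n <- 100000
# (a) accept/reject: uniform points in the square, keep those inside the circle
x <- runif(n, -1, 1); y <- runif(n, -1, 1)
inside  <- x^2 + y^2 <= 1
rho_rej <- sqrt(x[inside]^2 + y[inside]^2)
# (b) inversion of F(rho) = rho^2, i.e. pdf f(rho) = 2*rho
rho_inv <- sqrt(runif(n))
# random sign, so that the signed abscissa can be fed to the code of Method 2
xs <- rho_inv * (rbinom(n, 1, 0.5)*2 - 1)
efficiency <- mean(inside)                   # close to pi/4
c(mean(rho_rej), mean(rho_inv), efficiency)
```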



Figure 12: App Inventor blocks for the core of Method 2. (The minus sign of the y argument of 
atan2 is due to the fact that in App Inventor the y coordinate inside a ‘canvas' is upside down, 
being the origin in the top left corner, while the ‘heading’ angle follows the usual convention. 
Note also how, contrary to other scientific libraries, the trigonometric functions use degrees.) 
Figure 13: App Inventor blocks to turn Method 2 into Method 3 (see text).

5.4 Model 4 ($\mathcal{M}_4$)

Also in the case of the fourth method we can reuse the code written for Method 2, without having to calculate the intersections of two circles. In fact the first endpoint is $(-R, 0)$, while the second can be easily found using elementary geometry. With the help of Fig. [14] we can recognize a useful right triangle and then make use of a famous theorem of geometry. The projection $p$ is then $l^2/2R$, and the abscissa $x_2$ of the intersection is equal to $x_2 = p - R = l^2/2R - R$. The corresponding ordinate is therefore $y_2 = \sqrt{R^2 - x_2^2}$. Then we can rotate the points ‘at random’ as done above, when we described Method 2. These are the two lines of R code that replace the first line of code above

> p1 <- rep(-1, n)
> p2 <- runif(n, 0, 2)^2/2 - 1



Figure 14: A sketch which shows how to calculate the abscissa of the intersection ($x$) by simple geometry (see text).

and from the second line, tagged by “# [2]”, everything stays the same.
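The geometric shortcut can be verified numerically: with $R = 1$, the point of abscissa $l^2/2 - 1$ on the circle must lie at distance $l$ from the first endpoint $(-1, 0)$, since $(l^2/2)^2 + 1 - (l^2/2 - 1)^2 = l^2$. The sketch below (variable names and seed are illustrative assumptions) checks this, together with the fraction of chords shorter than $\sqrt{3}$, which for a length uniform in $[0, 2]$ is $\sqrt{3}/2$.

```r
# Sketch: check of the construction used for Method 4 (unit radius).
set.seed(4)                          # arbitrary seed
n  <- 100000
l  <- runif(n, 0, 2)                 # chord length, uniform by construction
x2 <- l^2/2 - 1                      # abscissa of the second endpoint
y2 <- -sqrt(1 - x2^2)                # lower intersection with the circle
d  <- sqrt((x2 + 1)^2 + y2^2)        # distance from the first endpoint (-1, 0)
max_err <- max(abs(d - l))           # should vanish up to rounding errors
frac_m4 <- mean(d < sqrt(3))         # expected to be close to sqrt(3)/2
c(max_err, frac_m4)
```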

5.5 Model 5 ($\mathcal{M}_5$)

The difference with respect to the previous model is that now the abscissas of the two points (before rotation!) are the same, while the ordinates are opposite to each other, as in Model 2. We have then

> p2 <- p1 <- runif(n, 0, 2)^2/2 - 1

and then we continue from the second line of the code of Model 2, tagged by “# [2]”.



Figure 15: Mikado sticks thrown at random on a pattern of circles. 


6 A physical model: throwing mikado sticks on a 
pattern of circles 

The authors of Ref. [5] claim to have found a ‘conclusive physical solution’ to the Bertrand problem, but I have strong doubts that they have ever tried to implement their model in a real experiment. Something which seems to me more realistic is the kind of game sketched in Fig. [15]: a pattern of circles^12 on a table, or on the floor, on

12 In Fig. [15] all circles have the same size, but this is not a necessary requirement, as will be clear in a while.
Figure 16: Sketch of the mikado experiment, showing how to evaluate the distance of the chord from the center from the position of the center of the stick and its orientation (see text).


which we throw mikado sticks, or toothpicks, or needles or something similar. The only, obvious, condition is on the minimum length of the sticks, which has to be at least twice the maximum diameter of the circles.^13

If we throw the sticks ‘at random’, somehow towards the center of the pattern in order to avoid complications with boundary conditions, we expect their centers and their orientations to be ‘uniformly’ distributed (the former in the plane, the latter in angle w.r.t. a given direction). We consider only sticks whose reference point, marked somehow, is inside one circle, and consider the resulting chords defined by the intersection of the stick with the circumference of that circle. As we can see from Fig. [15], the efficiency is rather high (and, as a byproduct, we can try to estimate empirically the area of the regions between circles, but this is a different story...).

Evaluating the pdf of the expected chord lengths might be complicated, but fortunately we can make use of some of our previous results. Let us now indicate with $r_0$ the distance from the center of the stick to the center of the circle and with $r$ the distance of the chord from the center of the circle (see Fig. [16]). The respective quantities in units of $R$ are then $\rho_0 = r_0/R$ and $\rho = r/R$.

By the hypothesis inherent to the throwing mechanism, the center of the stick is uniformly distributed in the circle. This implies, as we already know, that high values of $\rho_0$ are more probable than small ones, namely $f(\rho_0 \mid \mathcal{M}_6) = 2\rho_0$. The distance $r$ is related to $r_0$ by $r = r_0 \sin\alpha$, where the angle $\alpha$, defined in the construction in Fig. [16], is uniformly distributed between 0 and $\pi/2$. These are then

13 Even this condition is not necessary, if we imagine extending the stick at either end with a ruler to draw the chords. And, finally, it is not even required that the reference point, needed to decide inside which circle the chord has to be drawn, coincides with the center of the stick. The formulation in the text is, or at least so it seems to me, the easiest to implement in a real ‘game’, similar to the famous “Buffon’s needle”.
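our starting conditions:^14

\begin{align}
\rho &= \rho_0 \sin\alpha \tag{53} \\
f(\rho_0 \mid \mathcal{M}_6) &= 2\rho_0 \qquad (0 \le \rho_0 \le 1) \tag{54} \\
f(\alpha \mid \mathcal{M}_6) &= \frac{1}{\pi/2} = \frac{2}{\pi} \qquad (0 \le \alpha \le \pi/2). \tag{55}
\end{align}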
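Before going through the calculation, the starting conditions can be tested against an explicit ‘throw’. The sketch below is only illustrative (variable names, seed and sample size are assumptions): it samples $\rho$ from Eqs. (53)-(55) and, independently, from a reference point uniform in the unit disc combined with a uniformly distributed orientation, taking as $\rho$ the distance of the resulting line from the center. Since $\rho_0$ and $\alpha$ are independent, both samples should have mean $E(\rho_0)\,E(\sin\alpha) = (2/3)(2/\pi) = 4/(3\pi)$.

```r
# Sketch: mikado starting conditions versus an explicit throw (unit radius).
set.seed(5)                            # arbitrary seed
n <- 100000
# (a) the starting conditions (53)-(55)
rho0  <- sqrt(runif(n))                # f(rho0) = 2*rho0
alpha <- runif(n, 0, pi/2)             # f(alpha) = 2/pi
rho_a <- rho0 * sin(alpha)
# (b) reference point uniform in the disc, orientation uniform in [0, pi)
r  <- sqrt(runif(n)); ph <- runif(n, 0, 2*pi)
x  <- r*cos(ph); y <- r*sin(ph)        # reference point of the stick
th <- runif(n, 0, pi)                  # orientation of the stick
rho_b <- abs(x*sin(th) - y*cos(th))    # distance of the stick's line from the center
c(mean(rho_a), mean(rho_b), 4/(3*pi))
```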


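We can then calculate the probability distribution of $\rho$ and then make use of the reasoning applied in Method 3 to evaluate the probability distribution of $\lambda$. Indeed, for a given $\rho_0$, the pdf of $\rho$, conditioned by the value of $\rho_0$, will be given by

\begin{align}
f(\rho \mid \mathcal{M}_6, \rho_0) &= \int_0^{\pi/2} \delta(\rho - \rho_0 \sin\alpha) \cdot f(\alpha)\, d\alpha \tag{56} \\
&= \int_0^{\pi/2} \frac{\delta(\alpha - \alpha^*)}{\rho_0 \cos\alpha^*}\, \frac{2}{\pi}\, d\alpha, \tag{57}
\end{align}

with $\alpha^* = \arcsin(\rho/\rho_0)$, from which it follows

\begin{align}
f(\rho \mid \mathcal{M}_6, \rho_0) &= \frac{2}{\pi \rho_0 \sqrt{1 - (\rho/\rho_0)^2}} \tag{58} \\
&= \frac{2}{\pi \sqrt{\rho_0^2 - \rho^2}}. \tag{59}
\end{align}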


Having $f(\rho \mid \mathcal{M}_6, \rho_0)$ and $f(\rho_0 \mid \mathcal{M}_6)$ we can then evaluate $f(\rho \mid \mathcal{M}_6)$ as

$$f(\rho \mid \mathcal{M}_6) = \int_\rho^1 f(\rho \mid \mathcal{M}_6, \rho_0) \cdot f(\rho_0 \mid \mathcal{M}_6)\, d\rho_0, \tag{60}$$

in which we have to pay attention to the condition $\rho \le \rho_0$. The result is

\begin{align}
f(\rho \mid \mathcal{M}_6) &= \int_\rho^1 \frac{2}{\pi \sqrt{\rho_0^2 - \rho^2}} \cdot 2\rho_0\, d\rho_0 \tag{61} \\
&= \int_\rho^1 \frac{4\rho_0}{\pi \sqrt{\rho_0^2 - \rho^2}}\, d\rho_0 \tag{62} \\
&= \frac{4}{\pi} \sqrt{1 - \rho^2}. \tag{63}
\end{align}

14 An alternative way would be to use the angle $\beta$ defined in Fig. [16], ranging between 0 and $\pi$, with $f(\beta \mid \mathcal{M}_6) = 1/\pi$. The angle $\alpha$ inside the right triangle will be equal to $\beta$ if $\beta$ is smaller than $\pi/2$, and to $\pi - \beta$ otherwise. The relation (53) would then be replaced by $\rho = \rho_0 \sin\beta$ and the integral (56) replaced by the equivalent one in $d\beta$ between 0 and $\pi$, with the factor $2/\pi$ in the integrand replaced by $1/\pi$, apparently leading to results differing by a factor of 2. This apparent contradiction is resolved by noting that the transformation rule of the Dirac delta has now two roots, $\beta_1^* = \arcsin(\rho/\rho_0)$ and $\beta_2^* = \pi - \arcsin(\rho/\rho_0)$. But, since $|\cos\beta_1^*| = |\cos\beta_2^*|$, we have two identical contributions, thus exactly compensating the missing factor 2.

Having calculated the pdf of the distances of the chords from the center of the circle, we continue as in Eq. (27), thus obtaining

\begin{align}
f(\lambda \mid \mathcal{M}_6) &= \int_0^1 \delta\!\left(\lambda - 2\sqrt{1 - \rho^2}\right) \cdot \frac{4}{\pi} \sqrt{1 - \rho^2}\, d\rho \tag{64} \\
&= \int_0^1 \frac{\delta(\rho - \rho^*)}{\left. 2\rho/\sqrt{1 - \rho^2}\, \right|_{\rho = \rho^*}} \cdot \frac{4}{\pi} \sqrt{1 - \rho^2}\, d\rho \tag{65} \\
&= \frac{(4/\pi) \sqrt{1 - \rho^{*2}}}{2\rho^*/\sqrt{1 - \rho^{*2}}} = \frac{2}{\pi}\, \frac{1 - \rho^{*2}}{\rho^*}, \tag{66}
\end{align}

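with the usual $\rho^* = \sqrt{1 - (\lambda/2)^2}$. We get finally

\begin{align}
f(\lambda \mid \mathcal{M}_6) &= \frac{2}{\pi}\, \frac{(\lambda/2)^2}{\sqrt{1 - (\lambda/2)^2}} \tag{67} \\
F(\lambda \mid \mathcal{M}_6) &= \frac{2}{\pi} \arcsin(\lambda/2) - \frac{\lambda}{\pi} \sqrt{1 - (\lambda/2)^2}, \tag{68}
\end{align}

shown in Fig. [17] and from which we can calculate the usual indicators

\begin{align}
E(\lambda \mid \mathcal{M}_6) &= \frac{16}{3\pi} \approx 1.70 \tag{69} \\
\sigma(\lambda \mid \mathcal{M}_6) &= \sqrt{3 - \left(\frac{16}{3\pi}\right)^2} \approx 0.34 \tag{70} \\
P(\lambda < \sqrt{3} \mid \mathcal{M}_6) &= \frac{2}{3} - \frac{\sqrt{3}}{2\pi} \approx 0.39 \tag{71} \\
P(\lambda < 1 \mid \mathcal{M}_6) &= \frac{1}{3} - \frac{\sqrt{3}}{2\pi} \approx 0.058. \tag{72}
\end{align}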
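The indicators can be cross-checked numerically with R's `integrate`. In this sketch the helper names (`f_rho6`, `F_lam6`, `lam`) are assumptions chosen for illustration; the expectations are evaluated by integrating over $\rho$, where the integrand has no divergence, rather than over $\lambda$. After rounding, the results should reproduce the values 1.70, 0.34, 0.39 and 0.058 quoted above.

```r
# Sketch: numerical cross-check of the indicators for the mikado model.
f_rho6 <- function(rho) 4/pi * sqrt(1 - rho^2)                     # pdf of rho
F_lam6 <- function(l) 2/pi*asin(l/2) - l/pi*sqrt(1 - (l/2)^2)      # cumulative of lambda
lam    <- function(rho) 2*sqrt(1 - rho^2)                          # chord length from rho
norm6  <- integrate(f_rho6, 0, 1)$value                            # normalization
E6     <- integrate(function(r) lam(r)   * f_rho6(r), 0, 1)$value  # E(lambda)
E2_6   <- integrate(function(r) lam(r)^2 * f_rho6(r), 0, 1)$value  # E(lambda^2)
sd6    <- sqrt(E2_6 - E6^2)                                        # sigma(lambda)
round(c(norm6, E6, sd6, F_lam6(sqrt(3)), F_lam6(1)), 3)
```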


6.1 Remarks on simulation

We have seen in section [4] how to write chord length generators for the different methods without having to go through the steps of drawing the chords, while we have learned in section [5] how to reuse the code to draw the chords. In this case the situation seems a bit more complicated, but there are fortunately some ways out.


6.1.1 Random chord length generator 

The generators described in subsection 4.2 were based on the trick of inverting the cumulative distribution. However Eq. (68) is not easily invertible and we cannot use that trick.^15 We could then go one step back, and try to generate the values of $\rho$, the normalized distance of the chords from the center, and then calculate, from each $\rho$, the corresponding $\lambda$. But also in this case the cumulative distribution of interest is not invertible in closed form, being equal to

$$F(\rho \mid \mathcal{M}_6) = \frac{2}{\pi} \left( \rho \sqrt{1 - \rho^2} + \arcsin\rho \right). \tag{73}$$

We could make use of the hit/miss method, simply introduced with the help of Fig. [18]. The left plot shows (black curve) the pdf $f(\rho)$ inside a rectangle defined by the range
15 One might think of using the hit/miss method, described in the text in the following lines. But this cannot be used either, since $f(\lambda)$ diverges for $\lambda \to 2$, although its integral is finite.
Figure 17: Probability distribution of $\lambda = l/R$ of the chords generated with Method 6 (‘mikado’). The dashed vertical line indicates $\lambda = \sqrt{3}$.

Figure 18: Extracting $\rho$ according to its pdf using the hit/miss Monte Carlo technique. (But we shall make direct extractions inside the quarter of circle! See text.)

of $\rho$ and the range of its pdf. We then throw points ‘at random’, uniformly inside the rectangle, and mark in green those which fall between $f(\rho)$ and the abscissa, in red those which fall ‘outside’. We then draw some vertical lines to identify ‘slices’ in $\rho$. We see that the number of green points in each slice is proportional to $f(\rho)$ calculated in the middle of the slice. We can then imagine the slices becoming very thin, with the usual procedure of calculus, to convince ourselves that the probability that a green point falls inside a slice is proportional to $f(\rho)$. This is a well known technique to make extractions according to a given pdf, at the expense of some inefficiency, which in our case is tolerable (the inefficiency is just the fraction of red points over the total). It becomes instead intolerable if the pdf is very peaked somewhere and assumes very small values in the rest of the range of the variable. And it becomes impossible to use when the pdf diverges and the rectangle becomes infinitely high, as it would be with $f(\lambda \mid \mathcal{M}_6)$.
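The hit/miss procedure just described takes only a few lines of R. The following is a minimal sketch for $f(\rho) = (4/\pi)\sqrt{1-\rho^2}$; the variable names, the seed and the sample size are illustrative assumptions. Since the area under the pdf is 1 and the rectangle has area $4/\pi$, the expected fraction of accepted (‘green’) points is $\pi/4$.

```r
# Sketch: hit/miss extraction of rho for the mikado model.
set.seed(6)                                   # arbitrary seed
n    <- 100000
f    <- function(rho) 4/pi * sqrt(1 - rho^2)  # pdf of rho
fmax <- 4/pi                                  # maximum of the pdf, reached at rho = 0
x    <- runif(n, 0, 1)                        # abscissa of the thrown points
y    <- runif(n, 0, fmax)                     # ordinate of the thrown points
hit  <- y < f(x)                              # 'green' points, below the curve
rho  <- x[hit]                                # accepted abscissas follow f(rho)
efficiency <- mean(hit)                       # expected to be close to pi/4
c(efficiency, mean(rho))
```

The accepted values can then be turned into chord lengths with `2*sqrt(1 - rho^2)`.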

To conclude, in our case the hit/miss technique would be appropriate, but we can do even better, if we observe that the factor $4/\pi$ in Eq. (63) is irrelevant for the reasoning. If we drop it, we are left with $\sqrt{1 - \rho^2}$, which describes a circle in the first quadrant, as shown in the right plot of Fig. [18], with the ordinate indicated now by $h(\rho)$, being a generic ‘height’ and not a pdf. But we already know how to extract points uniformly inside a circle without having to throw points in the circumscribed square! What we need is just to limit the extraction to a quarter of the circle, and this can be easily achieved by limiting the polar angle between 0 and $\pi/2$. Here is directly the R command to generate a single value of $\rho$, followed by the corresponding value of $\lambda$

> rho = sqrt(runif(1))*cos(runif(1, 0, pi/2))
> lambda <- 2 * sqrt(1 - rho^2)

and finally in a single step, with n values:

> lambda <- 2 * sqrt(1 - runif(n)*cos(runif(n, 0, pi/2))^2)

Here is then the new version of our generator with all six models implemented:

rlchords <- function(n, meth) {
  u <- runif(n)
  switch(meth,
         2*sin(pi*u/2),
         2*sqrt(2*u-u^2),
         2*sqrt(u),
         2*u,
         2*sqrt(1-2*u^2+u^4),
         2*sqrt(1-u*cos(runif(n, 0, pi/2))^2))
}
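A quick way to gain confidence in such a generator is to compare, for each method, the simulated fraction of chords shorter than $\sqrt{3}$ with the expected probability. In the sketch below the generator is copied only to make the example self-contained; the reference values are those listed for the six models in the R code of Appendix C (vector `pLEsqrt3.Mi`), while the names `p_theory` and `p_sim`, the seed and the sample size are assumptions.

```r
# Sketch: simulated versus expected P(lambda < sqrt(3)) for the six methods.
rlchords <- function(n, meth) {
  u <- runif(n)
  switch(meth,
         2*sin(pi*u/2),
         2*sqrt(2*u-u^2),
         2*sqrt(u),
         2*u,
         2*sqrt(1-2*u^2+u^4),
         2*sqrt(1-u*cos(runif(n, 0, pi/2))^2))
}

set.seed(7)                                  # arbitrary seed
p_theory <- c(2/3, 1/2, 3/4, sqrt(3)/2, (3-sqrt(3))/2, 2/3-sqrt(3)/(2*pi))
p_sim <- sapply(1:6, function(m) mean(rlchords(100000, m) < sqrt(3)))
round(rbind(p_theory, p_sim), 3)             # the two rows should nearly coincide
```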

6.1.2 Random chord drawer 

Sampling $\rho$ is the key to simulating chords inside the circle selected by the mikado reference point, allowing us to reuse the code written for the other methods. We can then easily extend what we have done in section [5]. All we have to do is to replace in the code for Model 3 (subsection 5.3) the line

> rho <- sqrt(runif(n))

by

> rho <- sqrt(runif(n, 0, 1))*cos(runif(n, 0, pi/2))

and the game is over. Results from simulations are shown in Figs. ITTiKTl 

7 Conclusions 

Arguing, in abstract terms, about the solution of the Bertrand problem (not a paradox!) is as scientific as the Byzantine debates about the sex of angels. Nevertheless, the question can still make sense if framed in a practical context, asking people to draw a chord, by hand or with the help of computer graphics, and betting on the result.

Several random drawing methods have been analyzed in great detail, in order 
to show several issues related to the evaluation of the probability distributions of 
functions of variables whose pdf was assumed and to the drawing simulation. 

Of the six methods analyzed, five can be classified as ‘geometrical’, two of which are the ‘classical’ solutions. There was no intention to invent abstruse methods and indeed the two “with ruler and compass” algorithms seem to me even more ‘natural’ (or at least more suited to a class of people) than the more famous, ‘classical’ Method 3.

Finally, the only sound experiment I could think of (but perhaps I lack imagination) leads to a solution different from the claimed ‘conclusive’ physical solutions of the ‘paradox’.

“Il faut cultiver notre jardin”
Figure 19: Samples of chords generated with the various methods.

Figure 20: Distribution of the length of the chords in samples produced with the various methods.

Figure 21: Samples of centers of the chords generated with the various methods.

Figure 22: Distribution of the distance of the chords from the center of the circle in samples produced with the various methods.

Figure 23: Samples of chords generated with the various methods with respect to a preferred axis (“without rotation”, see text).
Appendix A: Distributions of chord distances from the center of the circle


Out of the six methods analyzed above, one ($\mathcal{M}_2$) is defined in terms of the distance between chord and center of the circle, while in two others ($\mathcal{M}_3$ and $\mathcal{M}_6$) we have learned that that distance is a useful quantity to simplify the solution of the problem. The three pdf’s of the normalized distances in these three cases are

\begin{align}
f(\rho \mid \mathcal{M}_2) &= 1 \tag{74} \\
f(\rho \mid \mathcal{M}_3) &= 2\rho \tag{75} \\
f(\rho \mid \mathcal{M}_6) &= \frac{4}{\pi} \sqrt{1 - \rho^2}, \tag{76}
\end{align}


while we have shown in Fig. [22] the results of the simulation for all six methods. For completeness, let us do the exercise of calculating the pdf $f(\rho)$ also for the other three methods. This is in fact easy, at least in principle, because there is a simple geometric relation between the length of a chord and its distance from the center, since the chord and the radii to its endpoints identify an isosceles triangle. The only technical problem is that of getting a closed expression for the pdf. Let us recall here the relations between the length $\lambda$ of the chord and the distance $\rho$ from the center, both in units of the radius,

\begin{align}
\rho &= \sqrt{1 - (\lambda/2)^2} \tag{77} \\
\lambda &= 2\sqrt{1 - \rho^2}, \tag{78}
\end{align}

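which we shall use in the following subsections. In fact, using our transformation rule with the Dirac delta, $f(\rho)$ is given by

$$\begin{aligned}
f(\rho \mid \mathcal{M}_i) &= \int_0^2 \delta\!\left(\rho - \sqrt{1 - (\lambda/2)^2}\right) \cdot f(\lambda \mid \mathcal{M}_i)\, d\lambda \\
&= \left. \frac{4\sqrt{1 - (\lambda/2)^2}}{\lambda} \right|_{\lambda = \lambda^*} \cdot f(\lambda^* \mid \mathcal{M}_i) \\
&= \frac{2\rho}{\sqrt{1 - \rho^2}} \cdot f(\lambda^* \mid \mathcal{M}_i),
\end{aligned} \tag{79}$$

with $\lambda^* = 2\sqrt{1 - \rho^2}$.

Here are the results for the remaining three models:

\begin{align}
f(\rho \mid \mathcal{M}_1) &= \frac{2\rho}{\sqrt{1 - \rho^2}} \cdot \frac{1}{\pi \sqrt{1 - (\lambda^*/2)^2}} \tag{80} \\
&= \frac{2\rho}{\sqrt{1 - \rho^2}} \cdot \frac{1}{\pi \rho} \tag{81} \\
&= \frac{2}{\pi}\, \frac{1}{\sqrt{1 - \rho^2}} \tag{82} \\
f(\rho \mid \mathcal{M}_4) &= \frac{2\rho}{\sqrt{1 - \rho^2}} \cdot \frac{1}{2} \tag{83} \\
&= \frac{\rho}{\sqrt{1 - \rho^2}} \tag{84} \\
f(\rho \mid \mathcal{M}_5) &= \frac{2\rho}{\sqrt{1 - \rho^2}} \cdot \frac{\sqrt{2 + 2\sqrt{1 - (\lambda^*/2)^2}} + \sqrt{2 - 2\sqrt{1 - (\lambda^*/2)^2}}}{4 \times 2\sqrt{1 - (\lambda^*/2)^2}} \tag{85} \\
&= \frac{2\rho}{\sqrt{1 - \rho^2}} \cdot \frac{\sqrt{2 + 2\rho} + \sqrt{2 - 2\rho}}{8\rho} \tag{86} \\
&= \frac{\sqrt{2 + 2\rho} + \sqrt{2 - 2\rho}}{4\sqrt{1 - \rho^2}}, \tag{87}
\end{align}

where the form $\sqrt{4 - \lambda^{*2}}$ has been written as $2\sqrt{1 - (\lambda^*/2)^2}$, in which we easily recognize $2\rho$.

Figure 24: Distribution of the distance of the chords from the center of the circle. (The order of the models in the legend corresponds to the decreasing order of the pdf’s for small values of $\rho$.)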

The results are summarized in Fig. [24]. As happened for the pdf’s of $\lambda$, some of them diverge for $\rho \to 1$, but it has been checked that all integrals from 0 to 1 are finite and indeed yield 1, as they must.
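The normalization check mentioned above can be repeated with R's `integrate`, which copes with the integrable divergences at $\rho = 1$ because it never evaluates the integrand at the endpoints. This is only a sketch: the list name `f_rho` and the layout are assumptions, while the six functions are the pdf’s of $\rho$ of this appendix, in the order of the models.

```r
# Sketch: numerical normalization of the six pdf's of rho.
f_rho <- list(
  function(r) 2/pi / sqrt(1 - r^2),                                 # M1
  function(r) rep(1, length(r)),                                    # M2
  function(r) 2*r,                                                  # M3
  function(r) r / sqrt(1 - r^2),                                    # M4
  function(r) (sqrt(2 + 2*r) + sqrt(2 - 2*r)) / (4*sqrt(1 - r^2)),  # M5
  function(r) 4/pi * sqrt(1 - r^2))                                 # M6
norms <- sapply(f_rho, function(f) integrate(f, 0, 1)$value)
round(norms, 4)                          # all six values should be close to 1
```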
Appendix B: A chord length generator based on the Beta distribution


If it makes no sense to talk about the ‘correct’, or even the ‘best’, solution in abstract terms, it also makes little sense to make an inventory of all possible solutions. The reason for the six shown here, let us repeat it again, was the following:

• models $\mathcal{M}_1$, $\mathcal{M}_2$ and $\mathcal{M}_3$ are the ‘classical’ ones, i.e. those proposed by Bertrand himself;

• model $\mathcal{M}_4$ is as simple as the other three (and certainly, in my opinion, much more natural to think about than $\mathcal{M}_3$), justified by the habits of a ruler and compass geometer;

• model $\mathcal{M}_5$ is a variation of $\mathcal{M}_4$, in which the endpoints of the chord are defined by the two intersections of the pencil lead when the compass is rotated clockwise and anticlockwise;

• finally, model $\mathcal{M}_6$ was an attempt to think of a chord drawing game, in analogy to Buffon’s needle.


In all cases (even in the sixth one, which could indeed be played in practice), in order to refer to real cases, the question has been turned into the result of a generator that a ‘student’ (or anyone else who can write a computer program) might write. And, although at the very beginning one would mainly think of $\mathcal{M}_1$, and perhaps $\mathcal{M}_2$, once the ‘student’ starts thinking about the problem and learns how to reuse programming code, we have to be afraid (always think of bets!) that more sophisticated extraction models could be implemented.

Finally, once the student realizes that what matters is the distribution of the 
normalized distance, shown in the previous appendix, the number of methods which 
can be easily implemented immediately diverges. One only needs an easy generator 
of a quantity in the range between 0 and 1. The easiest possibility which offers quite 
some variability in shapes is provided by the Beta distribution, defined as 


$$f(x \mid \mathrm{Beta}(r, s)) = \frac{1}{\beta(r, s)}\, x^{r-1} (1 - x)^{s-1} \qquad (r, s > 0;\ 0 \le x \le 1), \tag{88}$$

in which the denominator, equal to

$$\beta(r, s) = \int_0^1 x^{r-1} (1 - x)^{s-1}\, dx,$$

defines the beta function. Examples of Beta distributions are shown in Fig. [25] (I refrain from showing the resulting chord distributions...). Here is how a chord length generator could be implemented in R:

> rlchordsBeta <- function(n, r, s) 2*sqrt(1-rbeta(n, r, s)^2)

And here is how to recover very easily the simulations with Methods 2 and 3:

> n=100000; hist(rlchordsBeta(n, 1, 1), nc=200, col='blue')
> hist(rlchordsBeta(n, 2, 1), nc=200, col='green')

Other fancy distributions are left to the imagination of the reader.
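The correspondence with Methods 2 and 3 can also be checked without histograms. Beta(1, 1) is the uniform distribution of $\rho$ (Method 2) and Beta(2, 1) has pdf $2\rho$ (Method 3), so the fractions of chords shorter than $\sqrt{3}$ should be close to 1/2 and 3/4, respectively. In this sketch the names `p_beta11` and `p_beta21`, the seed and the sample size are assumptions.

```r
# Sketch: Beta-based generator versus the expected probabilities of Methods 2 and 3.
rlchordsBeta <- function(n, r, s) 2*sqrt(1 - rbeta(n, r, s)^2)

set.seed(8)                                           # arbitrary seed
n <- 100000
p_beta11 <- mean(rlchordsBeta(n, 1, 1) < sqrt(3))     # Method 2: expected 1/2
p_beta21 <- mean(rlchordsBeta(n, 2, 1) < sqrt(3))     # Method 3: expected 3/4
c(p_beta11, p_beta21)
```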
Figure 25: Examples of Beta distributions for some values of $r$ and $s$. The parameters in bold refer to continuous curves.






Appendix C: Turning the Bertrand problem into an 
inferential-predictive game 

“I play at ecarte with a gentleman whom I know to be perfectly honest. 

What is the chance that he turns up the king? 
It is 1/8. This is a problem of the probability of effects. 

I play with a gentleman whom I do not know. 
He has dealt ten times, and he has turned the king up six times. 

What is the chance that he is a sharper? 
This is a problem in the probability of causes. 
It may be said that it is the essential problem of the experimental method ” 

(Henri Poincaré)

The Bertrand problem can be rephrased in terms of the causes and the effects of the above quote. If we assume a cause (a precise random drawing model) we are uncertain about the effects (the chord length). But we might also be uncertain about the model. And usually we do not consider all possible models equally likely, where “all models” only means “all models we can think of”. Usually there will be models in which we believe more and models in which we believe less, and we then rank the possibilities with a degree of belief, or probability.

As a consequence, the uncertainty about the occurrence of each possible effect 
(the length of a chord, rounded somehow) has to take into account the probability 
of each value of the length, given the model, and the probability of the model itself. 
This is done using the following well known rule of probability theory: 

$$f(\lambda \mid I) = \sum_i f(\lambda \mid \mathcal{M}_i, I) \cdot P(\mathcal{M}_i \mid I), \tag{89}$$

where the background information $I$ has been made explicit here to remind us that probability is always conditional probability. A similar formula applies to the probability that the length of the chord is smaller than any given value, that is

$$F(\lambda \mid I) = \sum_i F(\lambda \mid \mathcal{M}_i, I) \cdot P(\mathcal{M}_i \mid I). \tag{90}$$

In both cases we have a weighted average, with weights equal to the probabilities 
of the models. Figure Qj] shows with dashed lines the case in which the six models 
are initially considered equally likely. In particular, for the original Bertrand 
question we have, in units of the radius, 

$$P(\lambda < \sqrt{3} \mid I) = \sum_i P(\lambda < \sqrt{3} \mid \mathcal{M}_i, I) \cdot P(\mathcal{M}_i \mid I). \tag{91}$$
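As a worked instance of this weighted average, the following sketch evaluates it for two choices of priors. The vector of conditional probabilities is the one used in the simulation code of this appendix (`pLEsqrt3.Mi`); the names `prior_all`, `prior_no6`, `p_all` and `p_no6`, as well as the second choice of priors (model 6 excluded, the other five equally likely), are assumptions made for illustration.

```r
# Sketch: P(lambda < sqrt(3) | I) as an average over the models, weighted by the priors.
pLEsqrt3.Mi <- c(2/3, 1/2, 3/4, sqrt(3)/2, (3-sqrt(3))/2, 2/3-sqrt(3)/(2*pi))
prior_all <- rep(1/6, 6)                 # six models initially equally likely
prior_no6 <- c(rep(1/5, 5), 0)           # hypothetical: model 6 excluded
p_all <- sum(prior_all * pLEsqrt3.Mi)    # about 0.63
p_no6 <- sum(prior_no6 * pLEsqrt3.Mi)    # about 0.68
round(c(p_all, p_no6), 3)
```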

Let us now think about a different question. We assume we know a chord length, generated by one of the possible models. Our problem no longer concerns the length of that chord, assuming we are happy with the provided precision, but rather what algorithm was used to generate it. A problem that Poincaré would have classified as the essential problem of the experimental method is then to guess, in quantitative terms, which model was used.

Let us imagine, for example, that the extracted chord has a length, in our customary units of the radius, of 0.1, that is $\lambda_1 = 0.1$. Analyzing Fig. [TT] we tend to exclude with great certainty models 2, 3 and 6, attributing instead the result most likely to $\mathcal{M}_4$. But we would not believe this model much more than $\mathcal{M}_1$ or $\mathcal{M}_5$. If instead we had (very) good reasons, as we have learned, to exclude $\mathcal{M}_4$ and $\mathcal{M}_5$, we would be practically sure that $\lambda_1 = 0.1$ came from $\mathcal{M}_1$. As a consequence, our expectation about the length of a second chord will be that resulting from model $\mathcal{M}_1$, preferred by the experimental evidence.

Imagine instead the ‘curious’ (but not unlikely) situation in which: model 6 was for some reason excluded; we considered the other five models initially equally likely; the result was $\lambda_2 = 1.4$. In this case the experimental information would be practically irrelevant, and this datum would not update our opinion concerning the drawing model (see Fig. [TT] to figure out the reason).

If we continue the extractions (always using the same generator!) we keep updating the probability of the different models and the probability density function of the length of the next chord extracted, or, to make the game simpler, the probability that the next $\lambda$ will be equal to or smaller than $\sqrt{3}$, which we recalculate each time from Eq. (91). Put in these terms, the problem is similar to the six box one discussed in Ref. [8], each box having a different content of black and white balls. Indeed we have even the same number of ‘causes’. The main difference is that, instead of having only two outcomes (black and white), we have now all possible values between 0 and 2, although discretized. This discretization leads to another important difference. While in the six box problem some of the causes could eventually be ‘falsified’, i.e. their probability could become exactly zero (a black ball cannot come from a box containing only white balls, and the other way around), in the chord problem falsification is impossible.^16 Given the similarity of the inferential and predictive games, I refer then to [H] for a mini introduction to the probabilistic inference needed to analyze our problem.

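Probability theory teaches us how to update the odds, i.e. the ratio of probabilities, with a rule that it is convenient to rewrite in our case as

$$\frac{P(\mathcal{M}_i \mid \lambda_n, I)}{P(\mathcal{M}_j \mid \lambda_n, I)} = \frac{P(\lambda_n \mid \mathcal{M}_i, I)}{P(\lambda_n \mid \mathcal{M}_j, I)} \times \frac{P(\mathcal{M}_i \mid \lambda_{n-1}, I)}{P(\mathcal{M}_j \mid \lambda_{n-1}, I)}, \tag{92}$$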

where $\lambda_n$ is the length at the $n$-th extraction, starting from $n = 0$, that is ‘no extraction’, and then $P(\mathcal{M}_i \mid \lambda_0, I)$ stands for the priors. As we see in (92), the posterior from the $n$-th inference becomes the prior of the $(n+1)$-th. Moreover, we are not considering the pdf of $\lambda$, but the probability of obtaining a number from our chord generator, rounded somehow. For example, if we round to two decimal digits and
16 Some of the pdf’s go to zero for $\lambda \to 0$. However the probability in a finite interval around zero is different from zero because the pdf’s are larger than zero for $\lambda > 0$.
we get $\lambda_1 = 1.23$, the probabilities of interest will be^17

\begin{align}
P(\lambda_1 \mid \mathcal{M}_i, I) &= \int_{1.225}^{1.235} f(\lambda \mid \mathcal{M}_i, I)\, d\lambda \tag{93} \\
&= F(1.235 \mid \mathcal{M}_i, I) - F(1.225 \mid \mathcal{M}_i, I). \tag{94}
\end{align}

In this way we make the problem more realistic and avoid the singularities of some of the pdf’s (and we are also more precise in the calculation of the probability, in the regions in which the pdf’s are not linear enough).

This is the R code we need to calculate the cumulative distributions for the different models

F.lambda <- function(l, mod) {
  if (l <= 0) 0
  else if (l >= 2) 1
  else switch(mod,
              2/pi*asin(l/2),
              1 - sqrt(1-l^2/4),
              l^2/4,
              l/2,
              (2+sqrt(2-sqrt(4-l^2))-sqrt(2+sqrt(4-l^2)))/2,
              2/pi*asin(l/2) - l/pi*sqrt(1-l^2/4))
}

(Note how the function returns 0 for arguments smaller than 0, and 1 for arguments larger than 2.)

Then we need a function to calculate the odds between the different hypotheses. We can calculate them with respect to a reference one, from which all others can be easily calculated. Here it is (the default reference model was chosen to be the one producing a flat distribution of $\lambda$, but, as we shall see, the precise default value is irrelevant):

chordsModelsBF <- function(l, last.digit=0.01, ref.mod=4) {
  bf <- rep(1,6)
  hld <- last.digit/2
  for (mod in c(1:6)) {
    bf[mod] = (F.lambda(l+hld, mod) - F.lambda(l-hld, mod)) /
              (F.lambda(l+hld, ref.mod) - F.lambda(l-hld, ref.mod))
  }
  return(bf)
}

As we see from the list of the arguments, we also have to pass the value of the last digit. The function returns a vector of values. For example

> chordsModelsBF(1.23)

17 This rule can be applied also at the edges, since the pdf is null outside the range of the variable, as we shall see in the R code, in which we take care that the cumulative function is 0 for $\lambda$ below 0 and 1 for $\lambda$ above 2.
[1] 0.8073569 0.7799414 1.2300000 1.0000000 0.8058272 0.6107326

in which we see that this value provides slight evidence in favor of $\mathcal{M}_3$, as is clear from Fig. [IT], while 1.95 would favor, in order, $\mathcal{M}_6$, $\mathcal{M}_2$ and $\mathcal{M}_5$, while $\mathcal{M}_4$ is the least favored:

> chordsModelsBF(1.95)

[1] 2.868580 4.393492 1.950000 1.000000 3.166248 5.454361

Let us also try two extreme values, leaving the evaluation of the results to the reader (results are rounded to facilitate the reading).

> round(chordsModelsBF(0), 7)

[1] 0.6366204 0.0012500 0.0025000 1.0000000 0.5003129 0.0000027

> round(chordsModelsBF(2), 1)

[1] 18.0 28.3 2.0 1.0 20.0 36.0
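If the six models are initially considered equally likely, the vector returned by `chordsModelsBF` only needs to be normalized to give the probabilities of the models after a single observation, and the result does not depend on the reference model. The sketch below illustrates this for $\lambda = 1.95$; the two functions are copied only to make the example self-contained, while the names `bf`, `post` and `post_ref1` are assumptions.

```r
# Sketch: from ratios with respect to a reference model to model probabilities.
F.lambda <- function(l, mod) {
  if (l <= 0) 0
  else if (l >= 2) 1
  else switch(mod,
              2/pi*asin(l/2),
              1 - sqrt(1-l^2/4),
              l^2/4,
              l/2,
              (2+sqrt(2-sqrt(4-l^2))-sqrt(2+sqrt(4-l^2)))/2,
              2/pi*asin(l/2) - l/pi*sqrt(1-l^2/4))
}
chordsModelsBF <- function(l, last.digit=0.01, ref.mod=4) {
  bf <- rep(1,6)
  hld <- last.digit/2
  for (mod in c(1:6)) {
    bf[mod] = (F.lambda(l+hld, mod) - F.lambda(l-hld, mod)) /
              (F.lambda(l+hld, ref.mod) - F.lambda(l-hld, ref.mod))
  }
  return(bf)
}

bf   <- chordsModelsBF(1.95)               # ratios with respect to model 4
post <- bf / sum(bf)                       # equal priors: just normalize
b1   <- chordsModelsBF(1.95, ref.mod=1)    # same data, different reference model
post_ref1 <- b1 / sum(b1)
round(rbind(post, post_ref1), 3)           # the two rows coincide
```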

At this point we are ready to make the simulations. This is the the rest of the 
code to make the extractions and to update accordingly the various probabilities: 

rmod    <- sample(1:6)[1]               # true model, chosen at random
ref.mod <- 4                            # initial reference model, set here so that the
                                        # snippet runs on its own (the choice is irrelevant)

odds  <- matrix(rep(1, 6), nrow=1)      # one row per step, one column per model
probs <- matrix(rep(1/6, 6), nrow=1)
bf    <- matrix(rep(NA, 6), nrow=1)

# P(l <= sqrt(3) | M_i) for the six models
pLEsqrt3.Mi <- c(2/3, 1/2, 3/4, sqrt(3)/2, (3-sqrt(3))/2, 2/3-sqrt(3)/(2*pi))
pLEsqrt3 <- sum(probs[1,]*pLEsqrt3.Mi)
fLEsqrt3 <- NA

n <- 200
decimal.digits <- 2
last.digit <- 10^-decimal.digits

l <- NA                                 # l[1] is only a placeholder
for (i in 2:n) {
  l[i] <- round(rlchords(1, rmod), decimal.digits)
  #print(ref.mod)
  bf <- rbind(bf, chordsModelsBF(l[i], last.digit, ref.mod))
  odds <- rbind(odds, odds[i-1,]*bf[i,])
  ref.mod <- which(odds[i,]==max(odds[i,]))[1]   # most probable model so far
  odds[i,] = odds[i,] / odds[i, ref.mod]         # odds referred to that model
  probs <- rbind(probs, odds[i,]/sum(odds[i,]))
  pLEsqrt3[i] <- sum(probs[i,]*pLEsqrt3.Mi)
  # relative frequency among the i-1 chords drawn so far (the NA placeholder is skipped)
  fLEsqrt3[i] <- sum(l<=sqrt(3), na.rm=TRUE)/(i-1)
}
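
The updating scheme of the loop can be tried in isolation on a toy case. The sketch below is self-contained and rests on stated assumptions: only two models are compared, labelled A (pdf λ/2 on [0, 2]) and B (flat pdf), the chords are drawn from A by inverting its cdf, and the seed is arbitrary. Instead of rescaling the odds to the currently most probable model, as the loop above does, it accumulates the logarithms of the Bayes factors. All names are hypothetical.

```r
set.seed(1)                                  # arbitrary seed, for reproducibility only

# Assumed cdfs, clamped to 0 below 0 and to 1 above 2
FA <- function(l) pmin(pmax(l, 0), 2)^2 / 4  # model A: pdf l/2
FB <- function(l) pmin(pmax(l, 0), 2) / 2    # model B: flat pdf

n   <- 200
hld <- 0.01 / 2                              # half of the last digit
l   <- round(2 * sqrt(runif(n)), 2)          # draws from model A (cdf inversion)

# probability of each rounded value under the two models
pA <- FA(l + hld) - FA(l - hld)
pB <- FB(l + hld) - FB(l - hld)

# running log odds A:B, starting from 1:1
log.odds <- cumsum(log(pA) - log(pB))
prob.A   <- 1 / (1 + exp(-log.odds))         # P(A | data) after each draw

tail(prob.A, 1)                              # probability of A after n draws
```

Working with log odds is a different way of keeping the numbers in a safe range; the resulting probabilities are the same as those obtained by multiplying the Bayes factors step by step.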

Then this is how to plot the histories of the probabilities of the models: 

plot(probs[,1], ylim=c(10^-6, 1), xlab='', ylab=expression(P(M[i])),
     log='y', col='red', cex=0.9)
points(probs[,2], col='blue', cex=0.9)
points(probs[,3], col='green', cex=0.9)
points(probs[,4], col='magenta', cex=0.9)
points(probs[,5], col='orange', cex=0.9)
points(probs[,6], col='cyan', cex=0.9)

Another interesting plot concerns the probability that the next chord will have λ ≤ √3,
which was stored, step by step, in the vector pLEsqrt3, compared to the relative
frequency of occurrence of such a condition in the previous steps:

plot(pLEsqrt3, ylim=c(0,1), col='black',
     xlab='n', ylab=expression(P(l<=sqrt(3))), cex=0.7)
points(fLEsqrt3, ylim=c(0,1), pch=4, cex=0.7, col='gray')
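
The probability of the 'effect' plotted here is the average of the probabilities of λ ≤ √3 under the single models, weighted with the current probabilities of the models. A minimal sketch, with the per-model values taken from pLEsqrt3.Mi in the code above and a hypothetical helper p.effect:

```r
# P(lambda <= sqrt(3) | M_i), as listed in pLEsqrt3.Mi in the code above
pLEsqrt3.Mi <- c(2/3, 1/2, 3/4, sqrt(3)/2, (3 - sqrt(3))/2, 2/3 - sqrt(3)/(2*pi))

# hypothetical helper: probability of the effect averaged over the models
p.effect <- function(probs) sum(probs * pLEsqrt3.Mi)

p.effect(rep(1/6, 6))           # all models equally probable: about 0.635
p.effect(c(0, 0, 1, 0, 0, 0))   # certainty about the third model: 0.75
```

The first value is the starting point of the black points in the plot; as the probability concentrates on one model, the weighted average moves to the value of that model.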

The results are shown in Figs. 26-31, with the models identified by the same color
code used in the previous figures. In particular we see that some models are easy
to identify and others more difficult, as can be understood from the plots in
Fig. [HJ. As for the comparison between the probability of the next effect calculated
from probability theory (black circles) and that evaluated from the past frequency
(gray), we see that the former is more stable and converges rapidly to the correct
value, the one corresponding to the model about which we become practically sure.

Figure 26: Probability of ‘causes’ (above) and probability of an ‘effect’ (below), the last
compared with the relative frequency of occurrence (see text). [True model: M1]

Figure 27: Probability of ‘causes’ (above) and probability of an ‘effect’ (below), the last
compared with the relative frequency of occurrence (see text). [True model: M2]

Figure 28: Probability of ‘causes’ (above) and probability of an ‘effect’ (below), the last
compared with the relative frequency of occurrence (see text). [True model: M3]

Figure 29: Probability of ‘causes’ (above) and probability of an ‘effect’ (below), the last
compared with the relative frequency of occurrence (see text). [True model: M4]

Figure 30: Probability of ‘causes’ (above) and probability of an ‘effect’ (below), the last
compared with the relative frequency of occurrence (see text). [True model: M5]




Figure 31: Probability of ‘causes’ (above) and probability of an ‘effect’ (below), the last 
compared with the relative frequency of occurrence (see text). [True model: 
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